Development of an early alert model for pandemic situations in Germany

The COVID-19 pandemic has pointed out the need for new technical approaches to increase the preparedness of healthcare systems. One important measure is to develop innovative early warning systems. Along those lines, we first compiled a corpus of relevant COVID-19 related symptoms with the help of a disease ontology, text mining and statistical analysis. Subsequently, we applied statistical and machine learning (ML) techniques to time series data of symptom related Google searches and tweets spanning the time period from March 2020 to June 2022. In conclusion, we found that a long-short-term memory (LSTM) jointly trained on COVID-19 symptoms related Google Trends and Twitter data was able to accurately forecast up-trends in classical surveillance data (confirmed cases and hospitalization rates) 14 days ahead. In both cases, F1 scores were above 98% and 97%, respectively, hence demonstrating the potential of using digital traces for building an early alert system for pandemics in Germany.


Coronavirus disease 2019 (COVID-19
) is an infectious disease caused by the severe acute respiratory syndrome coronavirus type 2 (SARS-CoV-2) that emerged in December 2019.It was discovered in Wuhan, China, and quickly spread over the world, including Europe and the United States.In early 2020, almost three million positive cases were identified worldwide 1 .At the time of the first outbreak in China, many healthcare systems around the world were not well prepared for the pandemic.Measures to prevent its spread to other regions of the world were often hesitant and taken too late.In the light of this situation one of the missions of the French-German collaborative project AIOLOS (Artificial Intelligence Tools for Outbreak Detection and Response (https:// aiolos-proje ct.org)) is to implement modeling approaches that could support the development of an early warning system for pandemic situations.
Public health surveillance data, such as confirmed cases, hospitalizations and deaths, are critical for understanding disease epidemiology 2 .One big challenge, though, is that many countries do not always record traditional surveillance data in a fully automated and digital way in real time.Hence, there is a systematic and critical delay between reported surveillance and the real disease spread.However, time is a critical element for taking effective countermeasures against epidemic or pandemic situations 3,4 .In response to this situation, Westhaus et al. proposed systematic wastewater monitoring 5 , and according measures have been implemented in the USA, Australia, Israel, Spain, Italy and several French regions during the COVID-19 pandemic [6][7][8][9][10][11][12][13][14][15] .However, wastewater monitoring is expensive and requires knowledge of the specific virus particles that are searched for.
A principle alternative is the use of social media and other digital traces of people's online activities as a source of disease associated information.The underlying hypothesis is that people tend to search for signs of illness and post about them in social networks before confirmed cases are reported in traditional surveillance data.Hence, information may spread faster through digital traces compared to traditional channels.Over the last decade, several studies have explored this idea, for example, by leveraging Google searches and Twitter posts for predicting and tracking the spread of diseases such as the flu, dengue, Zika, MERS, Ebola and COVID-19 2,[16][17][18][19][20][21][22] .We refer to Botz et al. 23 for a more detailed review.For example, Kogan et al. used a Bayesian model to develop an early warning algorithm for COVID-19 based on several data sources (Google Trends, Twitter, UpToDate), fever incidence rates, and predictions made by the global epidemic and mobility model in the USA 2 .The algorithm was validated on US COVID-19 surveillance data as well as incidence rates of influenza-like illness, demonstrating that an uptrend in COVID-19 infections could be predicted up to 7 days in advance with a sensitivity of 0.75.
To narrow down the extensive list of potential search terms, we aimed to identify the most significant ones by assessing their frequency of appearance in PubMed and PMC articles compared to what would be expected by chance.To accomplish this, we conducted a hypergeometric test for each term 27 .The test was performed as follows: Let M be the total number of documents in SCAIView search engine, and n be the number of documents containing COVID-19-related terms.Furthermore, let N be the number of documents that contain a certain symptom/ synonym q.Finally, let x be the number of documents that contain both, COVID-19-related terms and q, i.e. the cardinality of the intersection of both sets.Then the probability of seeing an intersection at least as large as x is given by (1) The proposed workflow of developing an early alert model using digital traces from Google Trends and Twitter.

Google trends
Google allows all users to access and process anonymized data on relative search volume behavior with Google Trends.Using the Google Trends API, we obtained a daily time series of the search frequencies for the German symptom keyword through the Pytrends library (version 4.9.0).The TrendReq method was used to pass host language ("de-DE") and timezone offset (60) in the initialization step.We queried each symptom keyword term in a measure entitled "interest_over_time". Parameters such as "start_year" (2020), "start_mon" (1), "stop_year" (2022), "stop_mon" (6), and "geo" ("DE") were given in the function.The relative search result for a given keyword is between 0 and 100.A value of 0 indicates the lowest relative search interest for the given keyword, whereas a value of 100 indicates the date with the maximum search interest.For consistency, the daily data values are scaled by multiplying the daily value by the number of searches per month divided by 100 (https:// github.com/ Gener alMil ls/ pytre nds/ tree/ master/ pytre nds, accessed on June 15, 2022).The Google Trends dataset includes data from January 5, 2020 to June 28, 2022.We eliminated symptoms which consistently showed no Google queries throughout the entire time span.

Twitter microblogs
The study leveraged Twitter's streaming API and Tweepy library (version 4.8.0) to collect Twitter data.The streaming API allows for a stream of public tweets from the platform in real time that can be filtered using multiple parameters.Credentials of Academic Developer Portal are applied for getting the full-archive daily tweets counts using "get_all_tweets_count" in the case of a client application with the support of Pagination.Pagination programmatically retrieves the entire result set by setting "flatten" with a larger limit.In our study, each German symptom keyword and language ("DE") were specified at the query part.The Twitter dataset contains the daily search amount for certain German symptoms between January 1, 2020 and June 28, 2022.Similar to the situation with Google Trends, we removed symptoms for which no tweets could be found.

Surveillance data
In this study, we obtained daily COVID-19 case reports from Robert Koch-Institute (RKI) website as the groundtruth, including RKI confirmed cases, RKI deaths (https:// github.com/ robert-koch-insti tut, accessed on March 24, 2023), and RKI hospitalization (https:// github.com/ robert-koch-insti tut/ COVID-19-Hospi talis ierun gen_ in_ Deuts chland, accessed on February 22, 2023).The time series of surveillance data started on March 1, 2020, and we sliced it till June 28, 2022 for further analysis.Overall, it provided insight into the changing trends and patterns of the monitored area over a two and a half year period.Data has been normalized based on the number of corresponding cases per 100,000 people.

Trend decomposition
The occurrence of symptom mentioning in social media and Google searches may be confounded by natural seasonal fluctuations, resulting in an elevated level of false positive alarms.To remove unwanted such fluctuations, we thus applied robust time series decomposition techniques, more specifically Seasonal and Trend decomposition using LOESS (STL), a non-parametric method developed by Cleveland et al. 29 .STL has the strengths of simplicity and effectiveness when dealing with time series data as each component is allowed to change over time, and the system is robust to outliers 30 .
STL is a filtering procedure which is used for decomposing a seasonal time series into three components, which are called trend, seasonal, and remainder.The supposed raw data, the trend component, the so-called seasonal component, and the remainder component are denoted by Y v , T v , S v , and R v , respectively.For v = 1 to N: STL decomposition consists the following two recursive procedures: an inner loop and an outer loop.The inner loop updates the seasonal and trend components with seasonal smoothing and trend smoothing, which is handled by locally-weighted regression (LOESS) 31 .The procedure consists of six iterative steps as follows 32 : (1) Detrending: The detrended time series can be calculated via the formula , where k denotes the iteration.(2) Cycle-Subseries Smoothing: To smooth the subseries of detrended data and obtain the smoothing result, which is shown as C  6) Trend Smoothing: LOESS is used to smooth the deseasonalized series and to get the trend component of current pass: In the outer loop a robustness weight ( ρ v ) is defined, which reflects how extreme the remainder component is weighted: Vol:.( 1234567890 www.nature.com/scientificreports/ρ v is adjusted to reduce the influence of outliers in Cycle-Subseries and Trend Smoothing.We used the "STL" function in Python statsmodels library (version 0.13.5) to decompose the raw time series data, and "period" parameter was specified to adjust the seasonal part.For surveillance data, the "period" parameter was set to 7, and for digital traces time series data, the value was 30.

Log-linear regression model
We assume that pandemic events exhibit exponential behavior within a relatively short time frame.Our study thus employs a log-linear regression model to track the growth and decline of surveillance and digital traces in consecutive 14-day intervals throughout the training period, which spans from February 2020 to February 2022.This duration encompasses various phases of the pandemic.
In Eq. 3, the independent variable is the "time", more specifically, the days in our case, α is the estimated inter- cept, β is the slop coefficient, ε t represents the Gaussian noise term with a mean of 0 and a variance of σ 2 , and Y t represents the trend component at time t, which has previously been extracted via STL decomposition.The log transformed surveillance and digital time series data is fitted against days over the respective 14-days interval (sliding window).
We evaluated the statistical significance of coefficient β within each sliding window followed by an adjustment to P values using the Holm-Bonferroni method (5 % significance level), i.e. log(Y t 0 :t 13 ) was the first window.We determined the trend given β and the respective P value for every interval.When β exceeded 0 and the adjusted P value was less than 0.05, an exponential growth ("up-trend") was declared, whereas an exponential decay ("down-trend") was defined when β was less than 0 and the adjusted P value less than 0.05.For other situations, we declared "no trend".This procedure was repeated on successive days to obtain a sequence of adjusted P values and trends.The illustration of the log-linear regression model is shown in Fig. 2. For subsequent analysis we always assigned β to the first time point of the sliding window.Since the sliding window has a stride of 1 day, we were thus able to convert the original time series into a time series of β coefficients (i.e.slopes), one per day, except for the last 13 days.

Multi-symptom P value
To obtain a comprehensive metric for combining multiple symptoms of each digital data trace, the harmonic P (HMP) value approach was utilized, as described in study 2,33 .The calculation involved combining P values estimated for each symptom/synonym.The formula used for this calculation is as follows: where w i were weights that sum to 1, and each symptom/synonym was treated equally.Following that, P values were adjusted for multiple testing using the Holm-Bonferroni method at a significance level of 5% 28 .In particular, we voted for the trend across multiple symptoms per day, i.e. an "up-trend" was declared only if the majority of symptoms demonstrated an up-trend and the adjusted P value was less than 0.05.We assigned a "down-trend" using the same strategy, otherwise a "no-trend" was declared.
In case of harmonizing digital traces, the metric was calculated by combining the P value estimates across Google Trends and Twitter.

Evaluating early alerts in digital traces
To quantify the performance of digital traces as potential early indicators of the onsets of up-trends in surveillance data, we defined the true-positive rates (TPR), false-positive rates (FPR) and false-negative rates (FNR) of the up-trends in agreement to Kogan et al. 2 .Note that the evaluation metrics were calculated during February 2020 to February 2022, and only the first detected up-trend in digital traces and surveillance data was considered when the same trend occurred for a period of time before a different event (down-or no-trends) was observed.
Illustration of fitting the log-linear regression model and identifying the corresponding trend of each date.Surveillance and digital traces data is fitted within a successive 14-days time interval (sliding window).For each sliding window, the statistically significant coefficient and adjusted P values are calculated, e.g.An "up-trend" (1) is given for the sliding window as the coefficient β is larger than 0 and the adjusted P value is less than 0.05 threshold.www.nature.com/scientificreports/ (1) A true-positive (TP) event was declared, if an up-trend in the digital data source (an alert) fell within a 30 days window ahead of the onset of up-trends detected in surveillance data (confirmed cases, hospitalization, and deaths).The 30 days were counted from the beginning of each sliding window.(2) A false-negative (FN) event was declared, if an up-trend onset was detected in the surveillance data, but no alert was found in the digital data source within the 30 days time window ahead of this event.
(3) A false-positive (FP) was declared, if an alert in the digital data source was found, but no up-trend was detected within a 30 days time window in the surveillance data afterwards.Given these definitions, sensitivity, precision, and F1 score could be calculated.More specifically, the sensitivity is the ratio between the TP events ("successful alerts") and the sum of TP and FN events (the number of up-trend onsets in the surveillance gold standard), and the precision is the ratio between TP events and the sum of TP and FP events (the number of alerts observed in digital traces).Evaluation metrics for down-trends can be calculated in a likewise manner.

General setup
The general trend forecasting procedure performed is presented in Fig. 3. Trend forecasting was carried out using the same symptom corpus as during the trend analysis.We first processed the datasets using the sliding window approach, followed by separating into training and testing subsets.Next, we generated Random Forest and LSTM models and evaluated their performance.In order to tune the hyperparameters, time series cross-validation was utilized within the training set.The optimized forecasting model was evaluated on the independent test set by sensitivity, precision, and F1 score.

Learning from the time series
As outlined above, a multivariable longitudinal dataset was generated combining the time-stamped slope coefficients of all log-linear regression models, hence reflecting the short-term trends in the original time series data.This data was used to train machine learning models.The time period of the overall dataset ranged from March 2020 to June 2022.
We employed a sliding window approach to collect information on temporal dependence, where the window size was set to 28 days.The sliding window stride was 1 day.
Note that we had previously assigned a time-stamp to each log-linear regression slope coefficient based on the trend observed in the subsequent 14 days.To avoid overoptimism of model predictions we thus assigned a class label to each sliding window according to the type of trend (up-, down-, no-trend) observed in the surveillance data 14 days after the end of the sliding window in digital traces.That means the effective forecasting horizon of our models was set to 14 days.

Time series cross-validation
The sktime library (version 0.14.1)provides "SingleWindowSplitter" class to split the training set into temporal folds with constant size.The corresponding training subset of each fold consisted only of observations that occurred before the observation that forms the validation subset.During the time series cross-validation, we set the length of the training period to 90 days/fold.Note that over each of those 90 days we subsequently ran a sliding window to generate training samples, as described earlier.Moreover, we specified the test subsets of 30 days and the stride of 70 days to generate temporal dataset frames.
We avoided data leakage caused between training and test set by setting a "gap" between them.In this case, the gap was equal to the 14-days sliding window size in log-linear regression model.The following diagram illustrates the concept of time series cross-validation, where the blue observations form the training subsets and the orange observations create the test subsets (Fig. 4).

Hyperparameter optimization
Within the time series cross-validation, we automated the process of hyperparameter-tuning, for which we used the Optuna framework (version 2.0.0) 34.
The range of hyperparameters of Random Forest, including the number of estimators, minimal samples split, max depth range, minimal samples leaf, and maximal number of features, are listed in Supplementary Table S1.For LSTM, we used the Adam optimizer.The other hyperparameters, for instance, the number of epochs, batch size, number of hidden units, number of layers, learning rate and dropout rate, are shown in Supplementary Table S2.
During the hyperparameter tuning, we maximized the validation accuracy for Random Forest and minimize the validation Cross Entropy Loss for LSTM.The Tree-structured Parzen Estimator (TPE) 35 was defined as sampler.The number of trials was set to 90.Finally, we retrained the model with the suggested hyperparameters and used the hyperparameter-tuned model to conduct the rest of our study.

Evaluation of trend decomposition
We initially evaluated the applicability of the STL method to forecast the extracted trends from the raw data.Specifically, we utilized the "STL forecast" function from Python's statsmodels library (version 0.13.5) to extend time series data from an interval [1, t] to a future time point t + k , represented as T ν (t + k) = T ν (1 : t) .The "STL forecast" model was applied to the training data, covering the period from February 2020 to May 2022, and we extrapolated the digital trace time series extending from June 01, 2022, to June 28, 2022.Subsequently, we conducted the STL decomposition on the extrapolated data to derive the corresponding trend component.Through this process, we assessed the correlation between the extrapolated trend and the trend component for the same time period, which was extracted from a STL decomposition of the entire dataset, i.e. training plus test for each of the top 20 symptoms previously listed in Fig. 5 (see Supplementary Table S3 for a complete list and their German translations).The outcomes of this analysis clearly highlighted the robustness of the STL decomposition algorithm with correlations close to 1 (Supplementary Table S4), also in terms of the agreement of significant up-trends in the entire vs. forecasted data for different digital traces (Google Trends, Twitter, and combined of both), see Supplementary Table S5).

Symptom-level up-trends in digital traces precede up-trends in surveillance data
Given that our study aims to provide early warning indicators of pandemics in terms of confirmed cases, hospitalization and deaths, we examined whether up-trends in digital traces preceded those in established surveillance data on the level of individual symptoms (Google Trends (168 symptoms), Twitter (204 symptoms)).For this purpose, we first performed a statistical trend analysis in surveillance data from March 2020 to February 2022 as explained in the Methods Section.Then we calculated the sensitivity, precision, and F1 score of each individual symptom based on whether the onsets of up-trends derived from digital traces fell within a 30-day time window ahead of the onsets of up-trends in surveillance data 2 .Supplementary Table S6 and S7 present the evaluation metrics for the top 20 symptoms from each digital data trace (Google Trends and Twitter) with descending F1 scores for tracking the onsets of up-trends of confirmed cases, deaths, and hospitalization.
Overall these findings indicate that certain symptoms mentioned in different digital traces precede up-trends in classical surveillance data and may thus be employed to develop an early alert indicator.

Harmonizing multiple symptoms and digital traces into a combined indicator
We next sought to further investigate whether the combination of several symptoms from each digital trace (Google Trends and Twitter) could lead to more accurate detection of up-trends in surveillance data.For this purpose, we focused on the 20 most significant terms according to the hypergeometric test.More specifically, for each symptom ontology, we selected the synonym with the highest search volume in Google Trends and Twitter during the period from February 2020 to February 2022.We then combined these synonyms using the harmonic mean P-value (HMP) method 33 .Subsequently, we evaluated the alert detection performance of each digital trace relative to outbreaks in the surveillance data in the same way as described before.
As shown in Table 1, when tracking confirmed cases Google Trends provided an F1 score of 0.5.For Twitter, the F1 score was 0.47.The same metrics to hospitalization and death were considerably weaker around 0.38 or even lower.Furthermore, when harmonizing Google Trends and Twitter into a combined trace, the performance was improved when preceding confirmed cases (an F1 score of 0.59), but only 0.27 when tracking deaths.Based on this, we concluded that (a) digital traces could not be used to construct a reliable early warning indicator for deaths, and (b) a combination of Google Trends and Twitter holds most promises as an early indicator for incident cases and hospitalization.Hence, we focused our subsequent analysis on this combined digital trace.
Table 1.Sensitivity, Precision, and F1 Score for different digital traces (Google Trends, Twitter, and combined of both) as an early indicator for an onset of "up-trends" in COVID-19 gold standards (Confirmed cases, Deaths, and Hospitalization).www.nature.com/scientificreports/

Quantifying the benefit of combined digital trace
To better understand the potential benefit of Google Trends and the combined digital trace as early warning indicators compared to traditional surveillance data it is imperative to quantify the amount of time that trends in both traces precede those in incident cases and hospitalization.For this purpose, we considered the period from February 2, 2020 to February 28, 2022 in digital traces, and counted, how many days the alerts (i.e. the onsets of significant up-trends) in digital traces preceded the first significant up-trends in surveillance data (confirmed cases: March 3, 2020; hospitalization: March 2, 2020; deaths: March 5, 2020) (Fig. 6).Accordingly, we observed that the median time lag between onsets of up-trends in confirmed cases and alerts of Google Trends as well as the combined digital trace was 15 days.The median time lag between hospitalization vs. combined trace was 6 days.For hospitalization vs. Google Trends alone the median time lag was only 1 day (Fig. 7).Noteworthy, these time lags were counted as days from the end of each sliding window.
As depicted in Fig. 7, we conducted a similar analysis for down-trends, emphasizing the alert capabilities of both Google Trends and the combined data trace.For instance, there was a median time lag of 5 days between confirmed cases and Google Trends, while for hospitalization vs. the combined trace, the median time lag was 4 days.Altogether, our analysis shows that trends observed in digital data traces have the potential to serve as an early warning indicator, which could help to save critical time for preparation of governmental counter-measures.

Trend forecasting via machine learning
Due to the identified statistical association between digital traces, i.e.Google Trends and its combination with Twitter, and surveillance data we next explored, in how far up-and down-trends in the surveillance data could be forecasted via a machine learning classifier, once again using a sliding window approach.Whereas the previously discussed analysis only focused on identifying current trends in digital traces, the aim of such a model was thus to build a predictor for future events.The forecast horizon was set to 14 days, i.e. the aim of the algorithm was to predict the trend in confirmed cases and hospitalization 14 days ahead.It is worth noting that setting any smaller forecast horizon would result into overoptimism, because trends in digital traces had previously been calculated over a 14 days time period via the log-linear regression model, and in reality we cannot forecast trends less than 14 days ahead.The setting of forecasting horizon specifically applies to the last time point in each sliding window.
Table 2 summarizes the sensitivity, precision, and F1 scores of Random Forest and LSTM models for predicting trends of confirmed cases and hospitalization (from end of March 2022 to June 2022).During the training procedure, we performed a 9-fold time series cross-validation scheme to tune hyperparameters.The suggested hyperparameter values of the best-performing Random Forest and LSTM models are shown in Supplementary Table S8.The result indicates a better prediction performance of the LSTM model, which reached an F1 of 0.98 for up-trend forecasting of confirmed case and an F1 score of 0.97 for up-trend forecasting of hospitalization using the combination of Google Trends plus Twitter (2b and 2d).Besides, a down-trend in confirmed cases as well as hospitalizations was equally well predicted by F1 scores of 0.91 and 0.96, respectively.Considering all three possible classes (up-trend, down-trend, no-trend) a weighted average F1 of 0.95 for confirmed cases and 0.94 for hospitalization could be achieved.

Interpretation of trend forecasting model
We performed an analysis of the best performing LSTM models for predicting incident cases as well as hospitalization via Shapley Additive Explanations (SHAP) 36 to understand the influence of individual Google search and Twitter terms for model predictions of up-trends.Figure 8a and b depict the bar plots of predictive symptoms with their descending mean absolute SHAP values.For instance, Hypoxämie (hypoxemia) from Google Trends & Twitter, Kopfschmerzen (headache) and Muskelschmerzen (muscle pain) from Google Trends, trockener Husten (dry cough) and Übelkeit (nausea) from Twitter were indicative of up-trends in confirmed cases and hospitalization.Moreover, respiratorische Insuffizienz (respiratory insufficiency), Pneumonie (pneumonia) and Müdigkeit (fatigue) from Google Trends were relevant symptom for forecasting up-trends in confirmed cases.The same pattern was observed in Schock (shock) (Twitter), Angst (anxiety) and Sputum (sputum) (Google Trends) for predicting hospitalization up-trends.These findings are generally in line with the previously presented statistical trend analysis of individual symptoms.

Discussion
This work explored the possibility to use digital traces, in particular Google Trends and Twitter, in Germany to develop an early alert indicator and trend forecasting model.Existing work going into a similar direction has previously only been published for countries other than Germany 2,37-46 .Since the use of social media and internet searches is likely influenced by socio-economic and cultural background, we believe that a careful evaluation of according approaches for different regions of the world is necessary, and in this sense our work fills a gap.Furthermore, existing articles only focused on a limited set of ad-hoc query terms without a systematic way to identify them based on automated literature mining, as performed in our work.In contrast, our work resulted into a complete COVID-19-symptom corpus.
After transforming the mentioning of symptoms into a time series, we came up with a consensus indicator using log-linear regression, which we statistically evaluated against observed trends in gold standard surveillance data.We discovered that Google Trends and the combination with Twitter demonstrated a significant correlation with up-trends in surveillance data.These findings are consistent with the conclusion of the study by Kogan et al.  that harmonizing digital data traces could anticipate changes in COVID-19 related surveillance data 2 .In light of the strength of the statistical association and the observed time lag between digital traces and surveillance data, we further examined the ability to forecast future trends in the surveillance data.Here, we built, evaluated, and compared Random Forests and LSTM models, leading to the following findings: (1) The models built on a combination of Google Trends and Twitter outperformed those that were built only using Google Trends.

Data availability
Symptom Ontology data can be retrieved from EBI Symptom Ontology (https:// www.ebi.ac.uk/ ols/ ontol ogies/ symp), Surveillance data can be retrieved from Robert Koch-Institut (RKI) GitHub repository (https:// github.com/ orgs/ robert-koch-insti tut/ repos itori es), and digital trace data can be retrieved via the corresponding API tools.The data needed to evaluate the results are presented in the paper or supplementary documents.The code of the model is publicly available at https:// github.com/ danqi 123/ pande mic_ alert_ model_ social_ media.

Figure 3 .
Figure 3. Illustration of the framework for trend forecasting.

18 ,
2022 to March 17, 2022; April 27, 2022: the sliding window from March 31, 2022 to April 27, 2022; June 1, 2022: the sliding window from May 5, 2022 to June 1, 2022.The "gap" between training and test sets was equal to the effective forecasting horizon of 14 days as mentioned before.

Figure 4 .
Figure 4. Illustration of time series cross-validation.

Figure 5 .
Figure 5. Ranking of the top 20 English symptoms based on the frequency of their co-occurrence with COVID-19 related terms ("COVID", "Coronavirus disease", "COVID 19") in PubMed/PMC.The x represents the primary symptom terms with adjusted P values less than 0.001, excluding their synonyms.

Figure 6 .
Figure 6.Visualization of the up-and down-trends procedure applied to COVID-19 surveillance, Google Trends, and the combined trace over the period from February 2020 to February 2022.The up-and downtrends were detected by setting a significance level of 5% over the multiple testing corrected P values of the log-linear regression model coefficient β .The adjusted P values are shown as colored gradients, where darker red shade signifies the increased confidence of the up-trends, and darker green shade indicates the increased confidence of the down-trends.The minimum adjusted P value was 1e−30 for both up-and down-trends.The triangular markers are used to point out the date when the onsets of up-and down-trends were detected based on the significance level.Notably, as shown in the dashed line in the plot, February 2, 2020 was discovered to be an "up-trend" alert date of both traces.The first documented larger outbreak followed a carnival event in the city Heinsberg on February 15, 2020.

( 2 )Figure 7 .
Figure 7. Time lags between significant trends in surveillance data (confirmed cases and hospitalization) relative to significant trends in Google Trends and the combination of Google Trends plus Twitter during the time period February 2020 to February 2022.A negative difference shown on the x axis indicates that a significant trend in one of the digital traces preceded the observed trends in confirmed cases and hospitalization, respectively.Median time lag of confirmed cases versus Google Trends: 15 days; confirmed cases vs. Combined trace: 15 days; hospitalization versus Google Trends: 1 day; hospitalization versus Combined trace: 6 day.The number of days have been counted from the end of each sliding window.

Figure 8 .
Figure 8. SHAP values calculated for LSTM models predicting confirmed incident cases (a) and hospitalization (b), respectively.Models have been trained on combined Google Trends (GT) plus Twitter (TW) data.Only mean SHAP values for predicting up-trends are shown.